The Role of Knowledge-based Features in Polarity Classification at Sentence Level
Abstract
Though polarity classification has been extensively explored at document level, there has been little work investigating feature design at sentence level. Because a sentence contains only a small number of words, polarity classification at sentence level differs substantially from document-level classification: the resulting bag-of-words feature vectors tend to be very sparse, which lowers classification accuracy. In this paper, we show that performance can be improved by adding features specifically designed for sentence-level polarity classification. We consider both explicit polarity information and various linguistic features. A large proportion of the improvement obtained by using polarity information can also be achieved with a set of simple domain-independent linguistic features.

Introduction

One of the most popular subtasks of opinion mining is polarity classification, i.e. the task of distinguishing between positive and negative utterances. This task has been extensively explored at document level, but there has been comparatively little work at sentence level, although the task is an established research problem (Matsumoto, Takamura, and Okumura 2005; Meena and Prabhakar 2007).

Sentiment information is not evenly distributed across a document. Not only do documents usually comprise both subjective and factual sentences, but the polarity of subjective sentences within a document also varies. Thus, sentence-level classification can be used to improve document-level classification (McDonald et al. 2007). Moreover, for tasks demanding fine-grained text analyses, such as text summarization, sentiment classification at sentence level seems more appropriate than document classification.

Due to the small number of words within a sentence, polarity classification at sentence level differs substantially from document-level classification in that the feature vectors encoding sentences tend to be much sparser. Therefore, a classifier trained on bag of words performs worse than at document level. Fortunately, there is a plethora of linguistic features by which a word can be described within a sentence. We consider features such as part-of-speech information, clause types, depth of word constituents, or WordNet hypernyms. At document level, these features have hardly been used. In general, the benefit of these features remains controversial: their extraction is computationally expensive (many of them require linguistic pre-processing such as part-of-speech tagging or even syntactic parsing), and their contribution in terms of performance is fairly limited, since bag-of-words classifiers already provide a robust baseline.

We show that explicit polarity information and a set of simple linguistic features can significantly improve a standard bag-of-words classifier. The additional insight that a standard classifier can be improved by linguistic features in the absence of any polarity information might be useful in situations where no domain knowledge is available, since polarity information is domain-dependent to a great extent.

We consider polarity classification as a binary classification task. That is, we assume that each sentence to be classified is subjective. We neglect the distinction between objective and subjective content since this classification is usually solved independently (Pang and Lee 2004; Ng, Dasgupta, and Arifin 2006). Our experiments are carried out on a subset of the MPQA corpus (Wiebe, Wilson, and Cardie 2003).

Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
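To make the sparsity argument concrete, the following is a minimal Python sketch, not the authors' implementation. It builds bag-of-words vectors for a short document and for one of its sentences over the same vocabulary, compares the share of zero entries, and then appends two polarity-count features to the sentence vector. The tokenizer, the toy lexicon `POLARITY_LEXICON`, the example texts, and all function names are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy polarity lexicon (assumption); a real system would rely on
# a curated subjectivity or polarity resource instead.
POLARITY_LEXICON = {
    "good": 1, "great": 1, "excellent": 1,
    "bad": -1, "poor": -1, "terrible": -1,
}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [tok for tok in tokens if tok]


def bag_of_words(tokens, vocabulary):
    """Return one count per vocabulary entry, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def sparsity(vector):
    """Fraction of entries in the vector that are zero."""
    return sum(1 for value in vector if value == 0) / len(vector)


def polarity_features(tokens):
    """Count tokens the toy lexicon marks as positive and as negative."""
    positive = sum(1 for tok in tokens if POLARITY_LEXICON.get(tok, 0) > 0)
    negative = sum(1 for tok in tokens if POLARITY_LEXICON.get(tok, 0) < 0)
    return [positive, negative]


def featurize(text, vocabulary):
    """Bag-of-words counts followed by the two polarity counts."""
    tokens = tokenize(text)
    return bag_of_words(tokens, vocabulary) + polarity_features(tokens)


# Made-up example texts (assumption), chosen only to show the effect.
document = (
    "The plot was great. The acting was poor, though. "
    "Overall the film felt long but the ending was excellent."
)
sentence = "The acting was poor, though."

vocabulary = sorted(set(tokenize(document)))
doc_vec = bag_of_words(tokenize(document), vocabulary)
sent_vec = bag_of_words(tokenize(sentence), vocabulary)

print("document sparsity:", sparsity(doc_vec))
print("sentence sparsity:", sparsity(sent_vec))
print("sentence features:", featurize(sentence, vocabulary))
```

The two extra dimensions are dense by construction: every sentence receives a value for them regardless of which vocabulary words it contains. This is one simple way to read the claim that knowledge-based features can compensate for sparse bag-of-words vectors. The linguistic features named in the text (part-of-speech information, clause types, constituent depth, WordNet hypernyms) would need a tagger, a parser, or WordNet access and are not sketched here.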
Similar resources
Feature extraction in opinion mining through Persian reviews
Opinion mining deals with an analysis of user reviews for extracting their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in such area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...
Improving Sentimental Classifications Using Contextual Sentences
This paper presented a new methodology, which helps improve the accuracy of sentimental polarity classification. Unlike most prior works which focused on lexical features at the word level, the methodology presented here attempts to include more contextual information by focusing on the sentence level. This paper proposes the following process: (1) train a classifier using word-level lexical fe...
Semantic Role Labeling of Persian Sentences Using a Memory-based Learning Approach
Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
Iranian EFL Learners’ Lexical Inferencing Strategies at Both Text and Sentence levels
Lexical inferencing is one of the most important strategies in vocabulary learning and it plays an important role in dealing with unknown words in a text. In this regard, the aim of this study was to determine the lexical inferencing strategies used by Iranian EFL learners when they encounter unknown words at both text and sentence levels. To this end, forty lower intermediate students were div...
Classification of Inconsistent Sentiment Words using Syntactic Constructions
An important problem in sentiment analysis are inconsistent words. We define an inconsistent word as a sentiment word whose dictionary polarity is reversed by the sentence context in which it occurs. We present a supervised machine learning approach to the problem of inconsistency classification, the problem of automatically distinguishing inconsistent from consistent sentiment words in context...